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Abstract. Model reduction attempts to guarantee a desired “model quality”, e.g. given in 
terms of accuracy requirements, with as small a model size as possible. This article high¬ 
lights some recent developments concerning this issue for the so called Reduced Basis Method 
(RBM) for models based on parameter dependent families of PDEs. In this context the key 
task is to sample the solution manifold at judiciously chosen parameter values usually deter¬ 
mined in a greedy fashion. The corresponding space growth concepts are closely related to so 
called weak greedy algorithms in Hilbert and Banach spaces which can be shown to give rise 
to convergence rates comparable to the best possible rates, namely the Kolmogorov n-widths 
rates. Such algorithms can be interpreted as adaptive sampling strategies for approximating 
compact sets in Hilbert spaces. We briefly discuss the results most relevant for the present 
RBM context. The applicability of the results for weak greedy algorithms has however been 
confined so far essentially to well-conditioned coercive problems. A critical issue is therefore 
an extension of these concepts to a wider range of problem classes for which the conventional 
methods do not work well. A second main topic of this article is therefore to outline recent 
developments of RBMs that do realize n-width rates for a much wider class of variational 
problems covering indefinite or singularly perturbed unsymmetric problems. A key element 
in this context is the design of well-conditioned variational formulations and their numerical 
treatment via saddle point formulations. We conclude with some remarks concerning the 
relevance of uniformly approximating the whole solution manifold also when the quantity of 
interest is only of a functional of the parameter dependent solutions. 


1. Introduction 

Many engineering applications revolve around the task of identifying a configuration that 
in some sense best fits certain objective criteria under certain constraints. Such design or 
optimization problems typically involve (sometimes many) parameters that need to be chosen 
so as to satisfy given optimality criteria. An optimization over such a parameter domain 
usually requires a frequent evaluation of the states under consideration which typically means 
to frequently solve a parameter dependent family of operator equations 

(1-1) B y u = f, y € y. 

In what follows the parameter set y is always assumed to be a compact subset of R p for 
some fixed pgN and B y should be thought of as a (linear) partial differential operator whose 
coefficients depend on the parameters y £ y. Moreover, B y is viewed as an operator taking 
some Hilbert space U one-to-one and onto the normed dual V' of some (appropriate) Hilbert 
space V where U and V are identified through a variational formulation of 0 as detailed 
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later, see for instance (4.2). Recall also that the normed dual V' is endowed with the norm 

(w,v) 


( 1 . 2 ) 


M\v ~ sup 

vev,v^o IMIv 


where (■, ■) denotes the dual pairing between V and V'. 

Given a parametric model (1.1) the above mentioned design or optimization problems concern 
now the states u(y) £ U which, as a function of the parameters y £ y, form what we refer to 
as the solution manifold 


(1.3) M:={u(y):=B~ 1 f:yey}. 

Examples of < 0 > arise, for instance, in geometry optimization when a transformation of a 
variable finitely parametrized domain to a reference domain introduces parameter dependent 
coefficients of the underyling partial differential equation (PDE) over such domains, see e.g. [Hj . 
Parameters could describe conductivity, viscosity or convection directions, see e.g. [lOl l28l 124] . 
As an extreme case, parametrizing the random diffusion coefficients in a stochastic PDE e.g., 
by Karhunen-Loew or polynomial chaos expansions, leads to a deterministic parametric PDE 
involving, in principle, even infinitely many parameters, p = oo, see e.g. [8] and the literature 
cited there. We will, however, not treat this particular problem class here any further since, as 
will be explained later, it poses different conceptual obstructions than those in the focus of this 
paper, namely the absence of ellipticity which makes conventional strategies fail. In particular, 
we shall explain why for other relevant problem classes, e.g. those dominated by transport 
processes, M is not “as visible” as for elliptic problems and how to restore “full visibility”. 


1.1. General Context - Reduced Basis Method. A conventional way of searching for a 
specific state in At or optimize over At is to compute approximate solutions of (1.1) possibly 
for a large number of parameters y. Such approximations would then reside in a sufficiently 
large trial space Ujy C U of dimension J\f, typically a finite element space. Ideally one would 
try to assure that Ujy is large enough to warrant sufficient accuracy of whatever conclusions 
are to be drawn from such a discretization. A common terminology in reduced order modeling 
refers to Ujy as the truth space providing accurate computable information. Of course, each 
such parameter query in Ujy is a computationally expensive task so that many such queries, 
especially in an online context, would be practically infeasible. On the other hand, solving for 
each y £ y a problem in Ujg would just treat each solution u(y) as some “point” in the infinite 
dimensional space (7, viz. in the very large finite dimensional space Ujy. This disregards the 
fact that all these points actually belong to a possibly much thinner and more coherent set, 
namely the low dimensional manifold A4 which, for compact y and well posed problems 
is compact. Moreover, if the solutions u(y), as functions of y £ y, depend smoothly on y 
there is hope that one can approximate all elements of A4 uniformly over y with respect to the 
Hilbert space norm || • ||[/ by a relatively small but judiceously chosen linear space U n . Here 
“small” means that n = dim U n is significantly smaller than A f = dim L(a/, often by orders of 
magnitude. As detailed later the classical notion of Kolmogorov n-widths quantifies how well a 
compact set in a Banach space can be approximated in the corresponding Banach norm by a 
linear space and therefore can be used as a benchmark for the effectiveness of a model reduction 
strategy. 

Specifically, the core objective of the Reduced Basis Method (RBM) is to find for a given 
target accuracy e a possibly small number n = n(e) of basis functions <pj,j = 0,... ,n, whose 
linear combinations approximate each u £ A4 within accuracy at least e. This means that 
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ideally for each y £ y one can find coefficients Cj(y) such that the expansion 

n(e) 


(1.4) 

u n {x, y) :='Y^Cj{y)<l> j (x) 


0=0 

satisfies 


(1.5) 

II u{y) - u n (y)\\u < s, y£y 


Thus, projecting (1.1) into the small space U n := span {<^>o, • • •, <j>n} reduces each parameter 
query to solving a small n x n system of equations where typically n -C A f. 

1.2. Goal Orientation. Recall that the actual goal of reduced modeling is often not to recover 
the full fields u(y) £ M. but only some quantity of interest I(y) typically given as a functional 
I{y) := i(u(y)) of u(y) where l £ U' . Asking just the value of such a functional is possibly 
a weaker request than approximating all of u(y) in the norm || • ||[/. In other words, one may 


have \£(u(y)) — £(u n (y))\ < e without insisting on the validity of (1.5) for a tolerance of roughly 
the same size. Of course, one would like to exploit this in favor of online efficiency. Duality 
methods as used in the context of goal-oriented, finite element methods [3] are indeed known to 
offer ways of economizing the approximate evaluation of functionals. Such concepts apply in 
the RBM context as well, see e.g. (2H US]. However, as we shall point out later, guaranteeing 
that | £(u(y)) — £(u n (y))\ < e holds for y £ y, ultimately reduces to tasks of the type (1.5) as 


well. So, in summary, understanding how to ensure (1.5) for possibly small n(e) remains the 


core issue and therefore guides the subsequent discussions. 

Postponing for a moment the issue of how to actually compute the tfrj , it is clear that they 
should intrinsically depend on A4 rendering the whole process highly nonlinear. To put the 
above approach first into perspective, viewing u{x,y ) as a function of the spatial variables x 


and of the parameters y , (1.4) is just separation of variables where the factors Cj (y), <f>j ( x) are a 
priori unknown. It is perhaps worth stressing though that, in contrast to other attempts to find 
good tensor approximation, in the RBM context explicit representations are only computed for 
the spatial factors (f>j while for each y the weight Cj (y) has to be computed by solving a small 
system in the reduced space U n . Thus, the computation of {fo, ..., <j) n (e)} could be interpreted 
as dictionary learning and, loosely speaking, n = n(e) being relatively small for a given target 
accuracy, means that all elements in M. are approximately sparse with respect to the dictionary 
{ 00 ? • • • ) 4>rn ***}". 

The methodology just outlined has been pioneered by Y. Maday, T.A. Patera and collab¬ 
orators, see e.g. B HD M HU- As indicated before, RBM is one variant of a model oder 
reduction paradigm that is specially tailored to parameter dependent problems. Among its 
distinguishing constituents one can name the following. There is usually a careful division of 
the overall computational work into an offline phase, which could be computationally intense 
but should remain managable, and an online phase which should be executable with highest 
efficiency taking advantage of a precomputed basis and matrix assemblations during the offline 
phase. It is important to note that while the offline phase is accepted to be computationally 
expensive it should remain offline-feasible in the sense that a possibly extensive search over 
the parameter domain y in the offline phase requires for each query solving only problems in 
the small reduced space. Under which circumstances this is possible and how to realize such 
division concepts has been worked out in the literature, see e.g. [241 ! 23], Here we are content 
with stressing that an important role is played by the way how the operator B y depends on 


the parameter y, namely in an affine way as stated in (2.13) later below. Second, and this is 
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perhaps the most distinguishing constituent, along with each solution in the reduced model one 
strives to provide a certificate of accuracy, i.e., computed bounds for incurred error tolerances 

[22 El,. 


1.3. Central Objectives. When trying to quantify the performance of such methods aside 
from the above mentioned structural and data organization aspects, among others, the following 
questions come to mind: 


(i) for which type of problems do such methods work very well in the sense that n(e) in (1.51 
grows only slowly when e decreases? This concerns quantifying the sparsity of solutions. 

(ii) How can one compute reduced bases {fio ,..., <fi n ( e )} for which n(e) is nearly minimal in 
a sense to be made precise below? 

Of course, the better the sparsity quantified by (i) the better could be the pay-off of an 
RBM. However, as one my expect, an answer to (i) depends strongly on the problem under 
consideration. This is illustrated also by the example presented in 15.4 Question (ii), instead, 


can be addressed independently of (i) in the sense that, no matter how many basis functions 
have to be computed in order to meet a given target accuracy, can one come up with methods 
that guarantee generating a nearly minimal number of such basis functions? This has to do 
with how to sample the solution manifold and is the central theme in this paper. 

The most prominent way of generating the reduced bases is a certain greedy sampling of the 
manifold A4. Contriving greedy sampling strategies that give rise to reduced bases of nearly 
minimal length, in a sense to be made precise below, also for non-coercive or unsymmetric 
singularly perturbed problems is the central objective in this paper. We remark though that a 
greedy parameter search in its standard form is perhaps not suitable for very high dimensional 
parameter spaces without taking additional structural features of the problem into account. 
The subsequent discussions do therefore not target specifically the large amount of recent work 
on stochastic elliptic PDEs, since while greedy concepts are in principle well understood for 
elliptic problems they are per se not necessarily adequate for infinitely many parameters without 
exploiting specific problem dependent structural information. 

First, we recall in fj2]a greedy space growth paradigm commonly used in all established RBMs. 
To measure its performance in the sense of (ii) we follow [B] and compare the corresponding 
distances dist[/(A4, U n ) to the smallest possible distances achievable by linear spaces of dimen¬ 
sion n, called Kolmogorov n-widths. The fact that for elliptic problems the convergence rates 
for the greedy errors are essentially those of the n-width, and hence rate-optimal , is shown in 
m to be ultimately reduced to analyzing so called weak greedy algorithms in Hilbert spaces, see 
also 12 nsj. However, for indefinite or strongly unsymmetric and singularly perturbed problems 
this method usually operates far from optimality. We explain why this is the case and describe 
in q2a remedy proposed in (10] . A pivotal role is played by certain well-conditioned variational 


formulations of (1.1) which are then shown to lead again to an optimal outer greedy sampling 
strategy also for non-clliptic problems. An essential additional ingredient consists of certain 
stabilizing inner greedy loops, see fjs] The obtained rate-optimal scheme is illustrated by a 
numerical example addressing convection dominated convection-diffusion problems in (5.4 We 
conclude in (JBlwith applying these concepts to the efficient evaluation of quantities of interest. 


2. The Greedy Paradigm 


The by far most prominent strategy for constructing reduced bases for a given parameter 
dependent problem (1.1) is the following greedy procedure, see e.g. |22J- The basic idea is 
that, having already constructed a reduced space U n 


see e.g. [, 
of dimension n , find an element u n + \ = 
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u(y n + 1 ) in A4 that is farthest away from the current space U n , i.e., that maximizes the best 
approximation error from U n and then grow U n by setting U n +\ := U n + span {u ra+ i}. Hence, 
denoting by Pu,u n the [/-orthogonal projection onto U n , 

(2.1) y n+ 1 := argmax \\u(y) - Pu,u n u(y)\\u, u n+1 := u(y n+1 ). 

y&y 


Unfortunately, determining such an exact maximizer is computationally way too expensive even 
in an offline phase because one would have to compute for a sufficiently dense sampling of y 
the exact solution u(y) of (1.1) in U (in practice in Ujg-). Instead one tries to construct more 
efficiently computable surrogates R(y, U n ) satisfying 


( 2 . 2 ) 


I \u{y) - Pu,u„u{y)\\u < R{y,U n ), y &y. 


Recall that “efficiently computable” in the sense of offline-feasibility means that for each y £ y, 
the surrogate R(y, U n ) can be evaluated by solving only a problem of size n in the reduced space 
U n . Deferring an explanation of the nature of such surrogates, Algorithm [l] described below 
is a typical offline-feasible surrogate based greedy algorithm (SGA). Clearly, the maximizer in 
(2.3) below is not necessarily unique. In case several maximizers exist it does not matter which 
one is selected. 


Algorithm 1 surrogate based greedy algorithm 

1: function SGA 

2 : Set Uo := {0}, n = 0, 

3: while argmax, yg .y R(y, XJ n ) > tol do 

4: 

Vn +1 := argma xR(y,U n ), 
yey 

( 2 - 3 ) u n+1 := u(y n+1 ), 

U n+ 1 := span {[/„, {u(y n+ i)}} = span {m ,..., u n+1 } 

5: end while 
6: end function 


Strictly speaking, the scheme SGA is still idealized since: 

(a) computations cannot be carried out in U; 

(b) one cannot parse through all of a continuum y to maximize R(y , U n ). 

Concerning (a), as mentioned earlier computations in U are to be understood as synonymous 
to computations in a sufficiently large truth space U\r satisfying all targeted accuracy tolerances 
for the underlying application. Solving problems in Ujy is strictly confined to the offline phase 
and the number of such solves should remain of the order of n = dim U n . We will not distinguish 
in what follows between U and Ujy unless such a distinction matters. 

As for (b), the maximization is usually performed with the aid of a complete search over a 
discrete subset of y. Again, we will not distinguish between a possibly continuous parameter 
set and a suitable training subset. In fact, continuous optimization methods that would avoid 
a complete search have so far not proven to work well since each greedy step increases the 
number of local maxima of the objective functional. Now, how fine such a discretization for 
a complete search should be depends on how smoothly the u(y) depend on y. But even when 
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such a dependence is very smooth a coarse discretization of a high-dimensional parameter set 
y would render a complete search infeasible so that, depending on the problem at hand, one 
has to resort to alternate strategies such as, for instance, random sampling. However, since it 
seems that (b) can only be answered for a specific problem class we will not address this issue 
in this paper any further. 

Instead, we focus on general principles which guarantee the following. Loosely speaking the 
reduced spaces based on sampling A4 should perform optimally in the sense that the resulting 
spaces U n have the (near) “smallest dimension” needed to satisfy a given target tolerance while 
the involved offline and online cost remains feasible in the sense indicated above. To explain 
first what is meant by “optimal” let us denote the greedy error produced by SGA as 


(2.4) 


&n{M)u '■= max inf ||i> — u\ 

vEA 4 uEUn 


u = max 
yey 


\u(y) - Pu,u n u(y)\\u- 


Note that if we replaced in \2A\ the space U n by any linear subspace W n C U and infimize the 
resulting distortion over all subspaces of U of dimension at most n, one obtains the classical 
Kolmogorov n-widths d n (A4)u quantifying the “thickness” of a compact set, see (3.2). One 
trivially has 

(2.5) d n (M)u < cr n (M)u, n £ N. 


Of course, it would be best if one could reverse the above inequality. We will discuss in the 
next section to what extent this is possible. 

To prepare for such a discussion we need more information about how the surrogate R(y , U n ) 
relates to the actual error ||u(y) — Pu,u n u (y)\\u because the surrogate drives the greedy search 
and one expects that the quality of the snapshots found in SGA depends on how “tight” the 
upper bound in (2.2) is. 

To identify next the essential conditions on a “good” surrogate it is instructive to consider 
the case of elliptic problems. To this end, suppose that 

(B y u,v) =b y (u,v) = (f,v), u,v€U , 

is a uniformly H-coercive bounded bilinear form and / £ U ', i.e., there exist constants 0 < C\ < 
C\ < oo such that 

(2.6) ci|M| u<b y {v,v), \b y (u,v)\ < Ci|M|t/|Mli/> u,v £ U, y £ y, 


holds uniformly in y £ Jb The operator equation (1.1) is then equivalent to: given / £ U' and 
a y £ y, find u(y) £ U such that 


(2.7) 


b v (u(y), v ) = (f,v), v £ U. 


Ellipticity has two important well-known consequences. First, since (2.6) implies ||< 
Ci, WB^Wu^u < c x 1 the operator B y : U —>• U' has a finite condition number 

(2.8) ku,u'{B v ) := ||H y ||(7_ > [//||I3“ 1 || E //_ >C / < C\jc\ 

which, in particular, means that residuals in U' are uniformly comparable to errors in U 

(2.9) Cf 1 !!/- B y u\\u' < \\u(y) - u\\u < - B y u\\u>, u£U,y£y. 

Second, by Cea’s Lemma, the Galerkin projection n yj u n onto U n is up to a constant as good 
as the best approximation, i.e., under the assumption (2.6) 

c 

I \u(y) - ILy,u n u(y)\\u < — inf || u(y) - v\\ v . 

c\ veu n 


( 2 . 10 ) 
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(When b y { •,•) is in addition symmetric Ci/ci can be replaced by (Ci/ci) 1 / 2 .) Hence, by (2.91 
and ( 2 . 10 ), 

(2 11) co. tt \ (f,v)-by(Hy, Un u{y),v) 


R(y, U n ) := c 1 sup 

vGU 


m\u 


statisfies more than just ( 2 . 2 ), namely it provides also a uniform lower bound 

( 2 . 12 ) 


Cl 


—R(y,U n ) < ||u(y) - Pu,u n u(y)\\u, y € J 7 - 

t-'l 

Finally, suppose that 6 2/ (-, •) depends affinely on the parameters in the sense that 


M 


(2.13) 


= '^2dk(y)bk(u,v) 1 


k —1 


where the Ok are smooth functions of y £ (V and the bilinear forms &&(•, •) are independent of 
2 /. Then, based on suitable precomputations (in t/,v) in the offline phase, the computation of 
H y> u n u (y) reduces for each y £ y to the solution of a rapidly assembled (n x n)-system, and 
R(y, XJ n ) can indeed be computed very efficiently, see [23] [IBI 121] . 

An essential consequence of (2.2) and (2.12) can be formulated as follows. 


Proposition 2.1. Given U n C U, the function u n+ 1 generated by (2.3) for R(y,U n ) defined 
by (2.11), has the property that 


(2.14) 


||w n+ i - Pu,u n Un+i\\u > 7 w max min \\v - u\\u- 

C i veM u£U n 


Hence, maximizing the residual based surrogate R(y, U n ) (over a suitable discretization of 
(V) is a computationally feasible way of determining, up to a fixed factor 7 := C\/C\ < 1, the 
maximal distance between M and U n and performs in this sense almost as well as the “ideal” 
but computationally infeasible surrogate R*(y, U n ) '■= ||u(y) — Pu,u n u(y)\\u■ 


Proof of Proposition 


2.1 


Suppose that y = ar gmax u gV R (y,U n ), y* := argmax yg;y || u(y) - 


Pu,u n u(y)\\u so that u n+ 1 = u(y). Then, keeping ( 2 . 12 ) and ( | 2 . 10 ) in mind, we have 
||Un+l - Pu,U n Un+l\\u = \W(y) - Pu,U n u( 


u>^R(y,U n )>^-R(y*,U n ) 


GY 


Ci 


> ttII u (y*) - p u,uMy*)\\u = TT max||w(y) - Pu,u n u 
Cl Cl y&y 


where we have used (2.2) in the second but last step. This confirms the claim. 


□ 


Property (2.14) turns out to play a key role in the analysis of the performance of the scheme 
SGA. 


3. Greedy Space Growth 


Proposition 2.1 allows us to view the algorithm SGA as a special instance of the following 
scenario. Given a compact subset /C of a Hilbert space H with inner product (•, •) inducing the 
norm || • || 2 = (■, ■), consider the weak greedy Algorithm [ 2 ] (WGA) below. 

Note that again the choice of u n+ 1 is not necessarily unique and what follows holds for any 
choice satisfying (3.1). 

Greedy strategies have been used in numerous contexts and variants. The current version 
is not to be confused though with the weak orthogonal greedy algorithm introduced in [25) for 
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Algorithm 2 weak greedy algorithm 

1 : function WGA 

2 : Set Ho := {0}, n = 0, uq := 0, fix any 0 < 7 < 1, 

3: given H n , choose some u n + 1 € K. for which 

(3.1) min ||u n - u n+1 || > 7 m ax min \\v - v n \\ =: r ya n {K) H , 

V n £ Hn VG/C V n £ Un 

and set H n+ 1 := H n + span {u n+ i}. 

4: end function 


approximating a function by a linear combination of n terms from a given dictionary. In con¬ 
trast, the scheme WGA described in Algorithm [ 2 ] aims at constructing a (problem dependent) 
dictionary with the aid of a PDE model. While greedy function approximation is naturally 
compared with the best n-term approximation from the underlying dictionary (see [2] 25] for 
related results), a natural question here is to compare the corresponding greedy errors 

cr„(/C)ij := max min ||u — v„|| =: maxdist# (/C, U n ) 

VCiK. VnCiUn 

incurred when approximating a compact set K, with the smallest possible deviation of K, from 
any n-dimensional linear space, given by the Kolmogorov n-widths 

(3.2) d n (JC)H := inf sup inf ||u — u„|| = inf maxdist#(A, V), 

dim V=n yfzfcVntzV dimV=n 

mentioned earlier in the preceding section. One trivially has d n {JC)H < cr n {K,)H for all n £ N 
and the question arises whether there actually exists a constant C such that 

(3.3) a n (IC) H < Cd n (IC) H , n€ N. 


One may doubt such a relation to hold for several reasons. First, orthogonal greedy function 
approximation performs in a way comparable to best n-term approximation only under rather 
strong assumptions on the underlying given dictionary. Intutitively, one expects that errors 
made early on in the iteration are generally hard to correct later although this intuition turns 
out to be misleading in the case of the present set approximation. Second, the spaces U n 
generated by the greedy growth are restricted by being generated only from snapshots in K, 
while the best spaces can be chosen freely, see the related discussion in [ 1 ]. 

The comparison ( |3.3[ ) was addressed first in 0 for the ideal case 7 = 1 . In this case a bound 
of the form a n {K)H < Cn2 n d n {K)H could be established for some absolute constant C. This 
is useful only for cases where the n-widths decay faster than n~ 1 2~ n which indeed turns out to 
be possible for elliptic problems (2.7) with a sufficiently smooth affine parameter dependence 
( |2.13[ ). In fact, in such a case the u(y ) can be even shown to be analytic as a function of y, see 
[ 8 ] and the literature cited there. It was then shown in jj] that the slightly better bound 


(3.4) 


2 n + 1 


n e N, 


holds. More importantly, these bounds cannot be improved in general. Moreover, the possible 
exponential loss in accuracy is not due to the fact the greedy spaces are generated by snapshots 
from 1C. In fact, denoting by d n (IC)H the restricted “inner” widths, obtained by allowing 
only subspaces spanned by snapshots of K. in the competition, one can prove that d n (/C)# < 
nd n {K)H, which is also sharp in general g]. 

While these findings may be interpreted as limiting the use of reduced bases generated in a 
greedy fashion to problems where the ?r-widths decay exponentially fast the situation turns out 
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to be far less dim if one does not insist on a direct comparison of the type (3.31 with n being 
the same on both sides of the inequality. In mm the question is addressed whether a certain 
convergence rate of the n- widths d n {lC)H implies some convergence rate of the greedy errors 
The following result from [¥| gave a first affirmative answer. 


Theorem 3.1. Let 0 < 7 < 1 be the parameter in (3.1) and assume that do(IC)H < M for 
some M > 0. Then 

d n {fc)H < Mn ~ a , neN, 

for some a > 0 , implies 

(3.5) <r n (K) H <CMn~ a , n > 0, 

where C := q^(Aq) a and q := |~2“ +1 7 -1 ] 2 . 

This means that the weak greedy scheme may still be highly profitable even when the n- 
widths do not decay exponentially. Moreover, as expected, the closer the weakness parameter 
7 is to one, the better, which will later guide the sampling strategies for constructing reduced 
bases. 

Results of the above type are not confined to polynomial rates. A sub-exponential decay of 
the d n (IC) h with a rate e~ cn , a < 1 is shown in [4] to imply a rate 


(3.6) 


Vn(,IC) H < C(a, y)e 


= a/( 1 + a), n £ N. 


The principle behind the estimates (3.5), (3.6) is to exploit a “flatness” effect or what one 
may call “conditional delayed comparison”. More precisely, given any 6 £ (0,1) and defining 
q := [ 2(7 9)] 2 , one can show that ([H Lemma 2.2]) 

o n+qrn {JC) H > 0a n (1C) H => cr„(/C) ff < g 1 / 2 d m (/C) ff , n£ N. 

Thus, a comparison between greedy errors and ?r-widths is possible when the greedy errors do 
not decay too quickly. This is behind the diminished exponent a in ( |3.6[ ). 

These results have been improved upon in [13] in several ways employing different techniques 
yielding improved comparisons. Abbreviating a n := a n (IC)H,d n := d n (lC)H , a central result in 
the present general Hilbert space context states that for any N > 0,/\ > 1, 1 < m < K one 
has 

K 

(3.7) 


_ / TC \ fn / K \ K—m 

(^) 


2 (K—m) 


i =1 


As a first important consequence, one derives from these inequalities a nearly direct comparison 
between a n and d n without any constraint on the decay of a n or d n . In fact, taking N = 


0,K = n, and any 1 < m < n in (3.7), using the monotonicity of the a n , one shows that 

< ry-Zn ( 2?A ( _n 

— ' \ml \ n— 


2n 


(3.8) 


dm m from which one deduces 


min d n 

1 <m<n 


n£N. 


This, in particular, gives the direct unconditional comparison 

02n{10)H < n £N. 


The estimate (3.8) is then used in m to improve on ( |3.6| ) establishing the bounds 
(3.9) d n (lC) H <C 0 e- c ° n \ => cr n (IC) H < \/ 2 Co 7 _ 1 e _Cl " Q , n £ N, 
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i.e., the exponent a is preserved by the rate for the greedy errors. Moreover, one can recover 


(3.5) from ( |3.7| ) (with different constants). 

Although not needed in the present context the second group of results in [13] should be 
mentioned that concerns the extension of the weak greedy algorithm WGA to Banach spaces X 
in place of the Hilbert space H. Remarkably, a direct comparison between a n ()C)x and d n (JC)x 

no 


similar to (3.7) is also established in 


The counterpart to (3.8) reads a 2n < 2y 


„-i 


i.e., 

one looses a factor ffn which is shown, however, to be necessary in general. 

All the above results show that the smaller the weakness parameter 7 the stronger the 
derogation of the rate of the greedy errors in comparison with the n-widths. 


4. What are the Right Projections? 


As shown by (3.51 and (3.9), the weak greedy algorithm WGA realizes optimal rates for 


essentially all ranges of interest. A natural question is under which circumstances a surrogate 
based greedy algorithm SGA is in this sense also rate-optimal, namely ensures the validity of 
(3.5) and (3.9). Obviously, this is precisely the case when new snapshots generated through 


maximzing the surrogate have the weak greedy property (3.1). Note that Proposition 2.1 says 


that the residual based surrogate (2.11) in the case of coercive problems does ensure the weak- 


greedy property so that SGA is indeed rate-optimal for coercive problems. Note also that the 
weakness parameter 7 = c\/C\ is in this case the larger the smaller the condition number of 
the operator B y is, see (2.8). Obviously, the key is that the surrogate not only yields an upper 


bound for the best approximation error but also, up to a constant, a lower bound (2.12), and 


the more tightly the best approximation eror is sandwiched by the surrogate the better the 
performance of SGA. Therefore, even if the problem is coercive for a very small 7 = c\/C\ , as 
is the case for convection dominated convection-diffusion problems, in view of the dependence 
of the bounds in (3.51 and (3.9) on 7 -1 , one expects that the performance of a greedy search 


based on (2.11) degrades significantly. 


In summary, as long as algoritm SGA employs a tight surrogate in the sense that 


(4.1) 


CsR(y , U n ) < inf 
v£Un 


\u(y)-v\\u < R(y,U n ), y&y, 


holds for some constant eg > 0, independent of y £ y, algorithm SGA is rate-optimal in the 

i.e., it essentially realizes the n- width rates over all ranges of interest, 
-1 


sense of ( |3.5[ ), ( |3.9[ ) 
see m- We refer to c 


■ s := n. n (R) as the condition of the surrogate R(-,U n ). In the RBM 
community the constant cfl 1 is essentially the stability factor which is usually computed along 
with an approximate reduced solution. Clearly, the bounds in (j3]also show that the quantitative 
performance of SGA is expected to be the better the smaller the condition of the surrogate, 
i.e., the larger eg. 

As shown so far, coercive problems with a small condition number Ku,u'(By) represent an 
ideal setting for RBM and standard Galerkin projection combined with the symmetric surrogate 
(2.11), based on measuring the residual in the dual norm || • ||[// of the “error-norm” || • ||{/, 
identifies rate-optimal snapshots for a greedy space growth. Of course, this marks a small 
segment of relevant problems. Formally, one can still apply these projections and surrogates 


for any variational problem (2.7) for which a residual can be computed. However, in general, 


for indefinite or unsymmetric singularly perturbed problems, the tightness relation (4.1) may 


no longer hold for surrogates of the form (2.11) or, if it holds the condition n n (R) becomes 


prohibitively large. In this latter case, the upper bound of the best approximation error is too 
loose to direct the search for proper snapshots. A simple example is the convection-diffusion 
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problem: for / £ (Hq(Q,))' find £ Hq(SI), fl C R d , such that 

(4.2) e(Vu, Vu) + (b-S7u,v) + (cu,v) =: b y (u,v) = (f,v), v £ Hq(Q), 

where, for instance, y = (e,b) £ y := [e 0 , 1] x S d_1 , S d ~ l the (d — l)-sphere. 


Remark 4.1. It is well nown that when c — 4div& > 0 the problem (4.2) has for any f G 
:= {H£(n))' a unique solution. Thus, for U := Hq(CI) (2.6) is still valid but with 
K u,U'(B y ) ~ £ 1 which becomes arbitrarily large for a correspondingly small diffusion lower 
bound £q . 


The standard scheme SGA indeed no longer performs nearly as well as in the well conditioned 
case. The situation is even less clear when e = 0 (with modified boundary conditions) where no 
“natural” variational formulation suggests itself (we refer to m for a detailed discussion of these 
examples). Moreover, for indefinite problems the Galerkin projection does generally perform 
like the best approximation which also adversily affects tightness of the standard symmetric 
residual based surrogate (2.11). 

Hence, to retain rate-optimality of SGA also for the above mentioned extended range of 
problems one has to find a better surrogate than the one based on the symmetric residual 
bound in (2.11). We indicate in the next section that such better surrogates can indeed be 
obtained at affordable computational cost for a wide range of problems through combining 
Petrov-Galerkin projections with appropriate unsymmetric residual bounds. The approach can 
be viewed as preconditioning the continuous problem already on the infinite dimensional level. 


4.1. Modifying the Variational Formulation. We consider now a wider class of (not nec¬ 
essarily coercive) variational problems 

(4.3) b(u,v) = (f,v), veV, 


where we assume at this point only for each / G V' the existence of a unique solution u G U, 
i.e., the operator B : U -4 V' , induced by bf, •), is bijective. This is well known to be equivalent 
to the validity of 


(4.4) 


inf sup 


b(w, v) 


wew veV ||i/;||[/jp||y 


>P, 


sup sup 

V(zV w€U 


b(w, v) 

IMIc/IMk 


<c b , 


iorv G V3w G W, such that b(w,v) 0, 


for some constants j3, C b . However, one then faces two principal obstructions regarding an RBM 
based on the scheme SGA: 

(a) first, as in the case of (4.2) for small diffusion, ku,v'{B) < C b //3 could be very large so 
that the corresponding error-residual relation 

(4.5) ||u — v\\u < fd~ 1 \\f — Bv\\v, v G U, 


renders a corresponding residual based surrogate ill-conditioned. 

(b) When bf, •) is not coercive, the Galerkin projection does, in general, not perform as well 
as the best approximation. 

The following approach has been used in [10] to address both (a) and (b). The underlying 
basic principle is not new, see [T|, and variants of it have been used for different purposes 
in different contexts such as least squares finite element methods [ISj and, more recently, in 
connection with discontinuous Petrov Galerkin methods HQum]. In the context of RBMs the 
concepts of natural norms goes sort of half way by sticking in the end to Galerkin projections 
[24] . This marks an essential distinction from the approach in [10] discussed later below. 
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The idea is to change the topology of one of the spaces so as to (ideally) make the corre¬ 
sponding induced operator an isometry, see also Following uni, fixing for instance, || • ||y, 
one can define 

(4.6) HI# := SU P 6 |m?^ = \\ Bw \\v', wGU, 

vev IMIv 

which means that one has for Bu = / 

(4.7) \\u-w\\jy = \\f-Bw\\v', WGU, 

a perfect error-residual relation. It also means that, replacing || • ||j/ in 
the inf-sup constant /3 = 1. Alternatively, fixing || • ||;y, one may set 

(4.8) IMIy := sup = \\B*v\\u', v G V, 

weu \\w\\u 

to again arrive at an isometry B : U -A V' , meaning 

(4.9) \\u-w\\u = \\f - Bw\\y,, wGU. 

Whether the norm for U or for V is prescribed depends on the problem at hand and we refer 
to muni si for examples of both type. 

Next note that for any subspace W C U one has 

(4.10) uw — argmin ||it — w \\u = argmin ||/ — Bw\\v', 

w£W wGW 


(4.4) by || • || & , yields 


and analogously for the pair (U,V), i.e., the best approximation in the (7-norm is a minimum 
residual solution in the W-norm. 

To use residuals in V' as surrogates requires fixing a suitable discrete projection for a given 
trial space. In general, in particular when V ^ U, the Galerkin projection is no longer appro¬ 
priate since inf-sup stability of the infinite dimensional problem is no longer inherited by an 
arbitrary pair of finite dimensional trial and test spaces. To see which type of projection would 
be ideal, denote by Rjj : U' -A U the Riesz map defined for any linear functional £ € U' by 


(£,w) = (Ru£,w)u, w G U. 


Then, by (4.6), for any w G W C U, taking v := RyBw € V one has 
b{w,v) = (Bw,v) = (Bw, RyBw) = {Bw, Bw)v' = 
Thus, in particular, 


b{u - u h , R v bw ) = (it - u h , w)jj, 

i.e., given W C U, using V(W) := RyB(W) as a test space in the Petrov-Galerkin scheme 
(4.11) b(u h , v) = (f,v), v€ V(W ) := R V B(W), 


is equivalent to computing the t/-orthogonal projection of the exact solution it of (4.3) and 
hence the best (7-approximation to it. One readily sees that this also means 

b(w, v ) 


(4.12) 


inf sup ’ I, = 1, 

w& w veV (w) IMIcHMIv 


i.e., we have a Petrov-Galerkin scheme for the pair of spaces W, V (IT) with perfect stability 
and the Petrov-Galerkin projection is the best (7-projection. Unfortunately, this is not of 
much help yet, because computing the ideal test space V(W) = RyB{W) = B~*R^ 1 (W) is 
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not numerically feasible. Nevertheless, it provides a useful orientation for finding good and 
practically realisable pairs of trial and test spaces, as explained next. 

4.2. A Saddle Point Formulation. We briefly recall now from puni an approach to deriving 
from the preceding observations a practically feasible numerical scheme which, in particular, fits 


into the context of RBMs. Taking (4.101 as point of departure we notice that the minimization 


of ||/ — Bw\\y over W is a least squares problem whose normal equations read: find uy £ W 
such that (with Ry = Ry 1 ) 

(4.13) 0 =(f-Bu w ,Bw)y = (Ry 1 (f-Bu w ),Bw), w £ W. 

Introducing the auxiliary variable r := Ry 1 {f — Buy) which is equivalent to 


(4.14) 


(R v r,v) = {r,v) v = (f - Bu w ,v), v£V, 


the two relations (4.13) and (4.14) can be rewritten in form of the saddle point problem 
(4.15) 


(r,v) v + b(u w ,v) = (f,v), 

b(w,r ) = 0, 


v £ V. 
w £ W. 


The corresponding inf-sup constant is still one (since the supremum is taken over all of V) and 
(•, -)v is a scalar product so that (4.151 has a unique solution uw , see e.g. [5]. Taking for any 
w £W the test function v = RyBw £ V in the first line of (4.15), one obtains 


(r, v)y = (r, R v Bw)y = (r, Bw) = b(w, r) = 0, 

by the second line in ( 4.15| ) so we see that ( f,RyBw ) = b(uy,RyBw) holds for all w £ W 
which means that uw solves the ideal Petrov-Galerkin problem (4.11). Thus (4.15) is equivalent 


to the ideal Petrov Galerkin scheme (4.11). 


Of course, (4.15) is still not realizable since the space V is still the full infinite dimensional 


space. One more step to arrive at a realizable scheme is the following: given the finite dimen¬ 


sional space W find a finite dimensional space Z C b so that when V in (4.15) is replaced by 
Z, one obtains a stable finite dimensional saddle point problem which is the same as saying 
that its inf-sup constant is safely bounded away from zero. Since Z = V would yield perfect 
stability the choice of Z C V can be viewed as a stabilization. To quantify this we follow m 
and say that for some 6 £ (0,1), Z C V is 5-proximal for W C U if Z is suffciently close to the 
ideal test space V(W) = RyB(W ) in the sense that 

(4.16) ||(J - Py Z )R v Bw\\v < S\\R v Bw\\ v , w£W. 

The related main findings from m can be summarized as follows. 

Theorem 4.2. (i) The pair {uyz, r w,z) £WxZcUxV solves the saddle point problem 

/ 417 n (ry Z ,v)y+ b(uy Z ,v) = (f,v), v£Z, 

^ b(w,Uyz ) =0, w £ W, 

if and only if uy t z solves the Petrov-Galerkin problem 

(4.18) b{u w ,z,v) = (f,v), v £ Py Z (R V B(W)). 


(ii) If Z is 5-proximal for W, (4.18) is solvable and one has 

\\u~uw,z\\u < 

\\u - uw.zWij + \\rw,z\W < vat w€W \\u w ,z ~ w||£. 


(4.19) 


inf wgw || uw,z ~ Hip’ 

I ~ .. inf 
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(iii) Z is 6-proximal for W if and only if 


(4.20) 


inf sup 

wGW vGZ 


b(w , v) 


Ic/Flk 


> a/1 - ^ 2 . 


Note that (4.171 involves ordinary bilinear forms and finite dimensional spaces W, Z and (iii) 
says that the ^-projection of the ideal test space RyB{W) onto Z is a good test space if and 
only if Z is <5-proximal for W. Loosely speaking, Z is large enough to “see” a substantial part 
of the ideal test space RyB(W) under projection. The perhaps most important messages to be 
taken home regarding the RBM context read as follows. 


Remark 4.3. (i) The Petrov-Galerkin scheme (4.18) is realized through the saddlepointproblem 
(4.17) without explicitly computing the test space Py,z{RvB(W)). 


(ii) Moreover, given W, by compactness and (4.16), one can in principle enlarge Z so as to 


make 5 as small as possible, a fact that will be explointed later. 

(iii) The solution component Uw,z a near-best approximation to the exact solution u in the 
U-norm. 


(iv) rw,z can be viewed as a lifted residual which tends to zero in V when W grows and can be 
used for a-posteriori error estimation, see [5] . In the Reduced Basis context this can be exploited 
for certifying the accuracy of the truth solutions and for constructing computationally feasible 
surrogates for the construction of the reduced bases. 


5. The Reduced Basis Construction 


We point out next how to use the preceding results for sampling the solution manifold M 
of a given paramteric family of variational problems: given y £ y , f £ V y , find u(y) £ U y such 
that 


(5.1) 


b y {u(y),v) = (f,v), v&Vy. 


in a way that the corresponding subspaces are rate-optimal. We will always assume that the 


dependence of the bilinear form b v (-, •) on y £ y is affine in the sense of (2.13). 

As indicated by the notation the spaces U y , V y for which the variational problems are well- 
posed in the sense that the induced operator B y : U y —> V y is bijective, could depend on y 
through y-dependent norms. However, to be able to speak of a “solution manifold” At as 
a compact subset of some “reference Hilbert space”, the norms || ■ \\u y should be uniformly 
equivalent to some reference norm || • ||[/ which has to be taken into account when formulating 


(5.1). In fact, under this condition, as shown in m, lor well-posed variational formulations of 
pure transport problems the dependence of the test spaces V y on y £ y is essential, in that 


(5.2) V := p| V v , 

y&y 

is a strict subset of each individual V y . This complicates the construction of a tight surrogate. 
We refer to m for ways of dealing with this obstruction and confine the subsequent discussion 
for simplicity to cases where the test norms || • ||y are also uniformly equivalent to a single 
reference norm || • ||y, see the example later below. 

Under the above assumptions, the findings of the preceding section will be used next to 
contrive a well-conditioned tight surrogate even for non-coercice or severely ill-conditioned 
variational problems which is then in general unsymmetric, i.e., V y ^ U y . These surrogates 
will then be used in SGA. To obtain such a residual based well-conditioned surrogate in the 
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sense of (4.1), we first renorm the pairs of spaces U y or V y according to (4.6) or ( |4.8| ). In 
anticipation of the example below, for definiteness we concentrate on (4.6) and refer to jlOj for 
a discussion of (4.8). As indicated above, we assume further that the norms || • || t 7 , || • \\v y are 
equivalent to reference norms || • H^, || • ||y, respectively. 

5.1. The Strategy. Suppose that we have already constructed a pair of spaces U n C U y , V n C 
V y , y £ y, such that for a given <5 < 1 

M”-.”) yey< 


(5.3) 


inf sup 


weu n veVn 

i.e., V n C V is (5-proximal for U n C U. Thus, by Theorem |4.2| the parametric saddle point 
problem 

{r n (y),v)v v +b v (u n (y),v) = (f,v), v £ V n , 
b(w,r n (y )) =0, w £ W, 

has for each y £ V a unique solution (u n (y),r n (y)) £ U n xV n . By the choice of norms we know 
that 


(5.4) 


(5.5) 
i.e., 

(5.6) 


| u(y) - u r 


= 11 / - ByUr 


R(y, Un X Vn) := 11/ - By 


V', 


y&y, 


y&y 


suggests itself as a surrogate. There are some subtle issues about how to evaluate R(y, U n x V n ) 
in the dual Vjg- of a sufficiently large truth space Fy C V y , y £ y, so as to faithfully reflect errors 
in U,, , not only in the truth space f/y C U y but in 17, and how these quantities are actually 
related to the auxiliary variable ||j" n (j/)||v' which is computed anyway. As indicated before, 
these issues are aggrivated when the norms || • ||y are not all equivalent to a single reference 
norm. We refer to a corresponding detailed discussion in na §5.1] and continue working here 
for simplicity with the idealized version (5.6) and assume its offline feasibility. 

Thus, we can evaluate the errors \\u(y)—u n (y)\\ft and can determine a maximizing parameter 
y n +i for which 


(5.7) 


\u(y n +i) - u n (y n+ i)|U = max ||/ - ByU n (y)\\ V; 


vey 


Now relation (4.19) in Theorem |4.2| tells us that for each y £ y 

<(i -sr 1 


(5.8) 


i(y) - 1 


inf 

WEU-n 


iy) - 


I U y i 


i.e., u n (y) is a near best approximation to u from U n which is, in fact, the nearer to the best 


approximation the smaller <5. By (5.51 and (5.8), the surrogate (5.6) is indeed well-conditioned 


with condition number close to one for small 6. 

A natural strategy is now to enlarge U n to U n +i := U n + span{u(y n +i)}. In fact, this 
complies with the weak gready step (3.1) in )j3]with weakness parameter 7 = (1 — S) as close to 


one as one wishes, when S is chosen accordingly small, provided that the pair of spaces U n , V n 


satisfies (5.3). A repetition would therefore, in principle, be a realization of Algorithm]!] SGA, 
establishing rate-optimality of this RBM. Obviously, the critical condition for such a procedure 
to work is to ensure at each stage the validity of the weak-greedy condition © which in the 
present situation means that the companion space V n is at each stage d-proximal for U n . So far 
we have not explained yet how to grow V n along with U n so as to ensure <5-proximality. This is 
explained in the subsequent section. 
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Remark 5.1. One should note that, due to the possible parameter dependence of the norms 
II ' lie/ > II ' \W V on U> obtaining tight surrogates with the aid of an explicit Petrov-Galerkin for¬ 
mulation, would be infeasible in an RBM context because one would have to recompute the 
corresponding (parameter dependent) test basis for each parameter query which is not online- 
feasible. It is therefore actually crucial to employ the saddle point formulation in the context of 
RBMs since this allows us to determine a space V n of somewhat larger dimension than U n but 
stabilizes the saddle point problem for all y simultaneously. 


5.2. A Greedy Stabilization. A natural option is to enlarge V n by the second component 
r n (y n+ 1) of (5.4). Note though that the lifted resilduals r n tend to zero as n —> oo. Hence, the 


solution manifold of the (y-dependent version of the) saddle point formulation (4.15) has the 
form 

M x {0}, 


where M is the solution manifold of (5.1) (since r(y) = 0 for y £ (V)- Thus, the spaces V n are 
not needed to approximate the solution manifold. Instead the sole purpose of the space V n is 
to guarantee stability. At any rate, the grown pair U n+ i,V n + span{r n (y n+1 )} =: V®, 1 may 


fail to satisfy now (5.3). 


Therefore, in general one has to further enrich TF +1 by additional stabilizing elements again 


in a greedy fashion until (5.3) holds for the desired S. For problems that initially arise as natural 


saddle point problems such as the Stokes system, enrichments by so called supremizers (to be 
defined in a moment) have been proposed already in p3i iI5J [25j. In these cases it is possible to 
enrich V)), A by a fixed a priori known number of such supremizers to guarantee inf-sup stability. 
As shown in m, this is generally possible when using fixed (parameter independent) reference 
norms || • ||^, || • ||y for U and V. For the above more general scope of problems a greedy strategy 
was proposed and analyzed in [TO] , a special case of which is also considered in [T5] without 
analysis. The strategy in [TO] adds only as many stabilizing elements as are actually needed to 
ensure stability and works for a much wider range of problems including singularly perturbed 
ones. In cases where not all parameter dependent norms || • ||y are equivalent such a strategy 
is actually necessary and its convergence analysis is then more involved, see nm- 

To explain the procedure, suppose that after growing U n to U n + 1 we have already generated 
an enrichment Vj( +1 of Vf) +l (which could be, for instance, either Vf) +l := V n +span{r n (y n+ i)} 



(5.9) 


by{w,v) ( 

su p Ti—n—ir^n— = int su p 

v&vf +1 IMIvsIMIt), \ weu n+1 veV f + . 


b y (w,v) 

MklMIr 


If this worst case inf-sup constant does not exceed yet \/l — 5 2 , the current space V(( +1 does 
not contain an effective supremizer for y, w, yet. However, since the truth space satisfies the 


uniform inf-sup condition (5.3) there must exist a good supremizer in the truth space which 
can be seen to be given by 

(5.10) »- 


v = argmax - 

vev B FllH; II wilyA 


providing the next enrichment 

(5.11) V^ 1 :=sp a n{V n k +1 ,v}. 
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We defer some comments on the numerical realization of finding y,v in (5.9), (5.10) to the next 
section. 

This strategy can now be applied recursively until one reaches a satisfactory uniform inf-sup 
condition for the reduced spaces. Again, the termination of this stabilization loop is easily 

fj , || • ||y are uniformly equivalent to reference 


ensured when (2.13) holds and the norms 


norms 


1(7’ 


|y, respectively, but is more involved in the general case [TU]. 


5.3. The Double Greedy Scheme and Main Result. Thus, in summary, to ensure that 


the greedy scheme SGA with the particular surrogate (|5.6|), based on the corresponding outer 
greedy step for extending U n 


to U, 


n+1» 


has the weak greedy property (3.1), one can employ an 
inner stabilizing greedy loop producing a space V n+ i = V£ +1 which is <5-proximal for U n+ \. Here 
k* = k*(S) is the number of enrichment steps needed to guarantee the validity of ( |5.3[ ) for the 
given S. A sketchy version of the corresponding “enriched” SGA, developed in [10j. looks as 
follows: 


Algorithm 3 double greedy algorithm 


function SGA-dou 

Initialize U\,V®, S £ (0,1), target accuracy tol, n <— 1, 
while a n (A4) > tol do 


while U n , V° fail to satisfy (5.3) do 


compute V n with the aid of the inner stabilizing greedy loop, 

end while 

given U n ,V n , satisfying (5.3), compute U n+ 1, V® +1 with the aid of the outer greedy 


step 4, ( |2.3[ ) in algorithm SGA for the surrogate ( |5.6[ ), 
end while 
end function 


As indicated above, both Algorithm]!] SGA and Algorithm[3j SGA-DOU are surrogate based 
greedy algorithms. The essential difference is that for non-coercive problems or problems with 
an originally large variational condition number in SGA-DOU an additional interior greedy 
loop provides a tight well-conditioned (unsymmetric) surrogate which guarantees the desired 
weak greedy property (with weakness constant 7 as close to one as one wishes) needed for 
rate-optimality. 

Of course, the viability of Algorithm SGA-DOU hinges mainly on two questions: 

(a) how to find the worst inf-sup constant in (5.9) and how to compute the supremizer in 


fl5.10| )? 

(b) does the inner greedy loop terminate (early enough)? 

As for (a), it is well known that, fixing bases for U n ,V^, finding the worst inf-sup constant 
amounts to determine for y £ y the cross-Gramian with respect to b y (-,-) and compute its 
smallest singular value. Since these matrices are of size n x (n + k) and hence (presumably) 
of “small” size, a search over y requires solving only problems in the reduced spaces and are 
under the assumption (2.13) therefore offline-feasible. The determination of the corresponding 
supremizer v in (5.10), in turn, is based on the well-known observation that 


b y {w,v) c n - 
argrnax -= R Vg B y w, 

v&v t Mk 
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2 G Vy. 


which is equivalent to solving the Galerkin problem 

= by{w,z), 

Thus, each enrichment step requires one offline-Galerkin solve in the truth space. 

A quantitative answer to question (b) is more involved. We are content here with a few 
related remarks and we refer to a detailed discussion of this issue in jlOj . As mentioned before, 
when all the norms II • ll/y <11 • \\v > V G y, are equivalent to reference norms || • | mil ' \W, 




respectively, the inner loop terminates after at most the number of terms in (2.131. When the 
norms || • ||y are no longer uniformly equivalent to a single reference norm termination is less 
clear. Of course, since all computations are done in a truth space which is finite dimensional, 
compactness guarantees termination after finitely many steps. However, the issue is that the 
number of steps should not depend on the truth space dimension. The reasoning in uni used 
to show that (under mild assumptions) termination happens after a finite number of steps 
which is independent of the truth space dimension, is based on the following fact. Defining 


Un{y) := jw G U n 
(5.12) 


y =1}, solving the problem 

(y,w) := argmax inf \\R Vy B y w - <j>\\ Vy , 
yey;weUi(y) <t>€V* 


when all the || • ||p -norms are equivalent to a single reference norm, can be shown to be 


equivalent to a greedy step of the type (5.9) and can hence again be reduced to similar small 


eigenvalue problems in the reduced space. Note, however, that (5.12) is similar to a greedy 


space growth used in the outer greedy loop and for which some understanding of convergence is 
available. Therefore, successive enrichments based on (5.12) are studied in m regarding their 
convergence. The connection with the inner stabilizing loop based on (5.9) is that 


argmax inf 11 R v . By w - <j> \ \ y s < 5, 
yey-,w&ui( v ) <t>£V* 


just means 


inf || R Vy ByW - <t>\\ Vy < S\\R V B y \\v = <5|M 

06 V£ 


U v > 


w eu n , y g y , 


which is a statement on (5-proximality known to be equivalent to inf-sup stability, see Theorem 
4~2| and fl4T6l). 


A central result from m can be formulated as follows, see nm Theorem 5.5]. 


Theorem 5.2. If (|2.13|) holds and the norms 
norm 


lev l 


| y are all equivalent to a single reference 


U7> i 


V, respectively, and the surrogates (5.6) are used, then the scheme SGA-dou 


is rate optimal, i.e., the greedy errors <r n (M)f, decay at the same rate as the n-widths d n (A4) 


u> 


Recall that the quantitative behavior of the greedy error rates are directly related to those 
of the n- widths by 7 -1 = Cg 1 , see Theorem 3.1 This suggests that a fast decay of d n {M)yy is 


reflected by the corresponding greedy errors already for moderate values of n which is in the 
very interest of reduced order modeling. This will be confirmed by the expamples below. In 
this context an important feature of SGA-DOU is that through the choice of the d-proximility 
parameter the weakness parameter 7 can be driven towards one, of course, at the expense 
of somewhat larger spaces V n+ \. Hence, stability constants close to one are built into the 
method. This is to be contrasted by the conventional use of SGA based on surrogates that are 
not ensured to be well conditioned and for which the computation of the certifying stability 
constants tends to be computationally expensive. 
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5.4. A Numerical Example. The preceding theoretical results are illustrated next by a nu¬ 
merical example that brings out some of the main features of the scheme. While the double 
greedy scheme applies to non-coercive or indefinite problems (e.g. see [TDj for pure trans¬ 
port) we focus here on a classical singularly perturbed problem because it addresses also some 
principal issues for RBMs regarding problems with small scales. Specifically, we consider the 
convection-diffusion problem (4.2) on = (0, l) 2 for a simple parameter dependent convection 
held 


b{y) := 


cos y\ 
sin y) 


y&[ 0,2tt), c = 1, 


keeping for simplicity the diffusion level e fixed but allowing it to be arbitrarily small. All 
considerations apply as well to variable and parameter dependent diffusion with any arbitrarily 
small but strictly positive lower bound. The “transition” to a pure transport problem is dis¬ 
cussed in detail in mm- Parameter dependent convection directions mark actually the more 
difficult case and are, for instance, of interest with regard to kinetic models. 

Let us first briefly recall the main challenges posed by (4.2) for very small diffusion e. The 
problem becomes obviously dominantly unsymmetric and singularly perturbed. Recall that for 
each positive e the problem possesses for each y £ y a unique solution u(y) in U = H { \ (Q) 
that has a zero trace on the boundary dfl. However, as indicated earlier, the condition number 
Ku,u'(By) of the underlying convection-diffusion operator B y , viewed as an operator from U = 
onto U' = H~ 1 ( f2), behaves like e" 1 , that is, it becomes increasingly ill conditioned. 
This has well known consequences for the performance of numerical solvers but above all for 
the stability of corresponding discretizations. 

We emphasize that the conventional mesh dependent stabilizations like SUPG (cf. [TTJ) do 
not offer a definitive remedy because the corresponding condition, although improved, remains 
very large for very small e. In [193 SUPG-stabilization for the offline truth calculations as well as 
for the low-dimensional online Galerkin projections are discussed for moderate Peclet-numbers 
of the order of up to 10 3 . In particular, comparisons are presented when only the offline phase 
uses stabilization while the un-stabilized bilinear form is used in the online phase, see also the 
references in m for further related work. 

As indicated earlier, we also remark in passing that the singularly perturbed nature of the 
problem poses an additional difficulty concerning the choice of the truth space Ujy. In fact, 
when e becomes very small one may not be able to afford resolving correspondingly thin layers 
in the truth space which increases the difficulty of capturing essential features of the solution 
by the reduced model. 

This problem is addressed in m by resorting to a weak formulation that does not use Hq (II) 
(or a renormed version of it) as a trial space but builds on the results from [7]. A central idea 
is to enforce the boundary conditions on the outflow boundary T + ( y ) only weakly. Here T + ( y) 
is that portion of dfl for which the inner product of the outward normal and the convection 
direction is positive. Thus, solutions are initially sought in the larger space =: 

U-(y) enforcing homogeneous boundary conditions only on the inflow boundary T_(y). Since 
the outflow boundary, and hence also the inflow boundary depend on the parameter y , this 
requires subdividing the parameter set into smaller sectors, here four, for which the outflow 
boundary T + = r + (y) remains unchanged. We refer in what follows for simplicity to one such 
sector denoted again by y. 
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Figure 5.1. left: e = 2 5 ,n = 6,ny = 13; middle: e = 2 7 ,n = 7,ny = 
20; right: e = 2 -26 , n = 20, ny = 57. 


The following prescription of the test space falls into the category (4.6) where the norm for 
U is adapted. Specifically, choosing 

1 


s y (u,v) := ~({B y u,v)+ {B y v,u)), 

\\ v \\v y ■= s y {v,v) =c\v\ 2 HHn) + (c-i 


' ~ 2 div 


1/2 2 
V 


L 2 (Q) 


in combination with a boundary penalization on T + , we follow [71128] and define 


u ll U; — \\B v u 


y u \\v' 


J y a IIV' 




1 / 2 — — 

where Hb(y) = H 0 q (T + (y)), V y := V y x Ht,(y)' and B y denotes the operator induced by this 
weak formulation over U y := H ^ r ^ (0) x Hb(y). The corresponding variational formulation 


is of minimum residual type (cf. (4.10)) and reads 


(5.13) 


u(y ) = sigmm w&u _ {y) {\\B y w - f\\^, +X\\w\\ 2 Hb{y) }. 


One can show that its (infinite dimensional) solution, whenever being sufficiently regular, solves 
also the strong form of the convection diffusion problem (4.2). Figure 5.1 illustrates the effect 


of this formulation where we set n = dim U n , ny ■= dim V n 

The shaded planes shown in Figure [5T] indicate the convection direction for which the snap¬ 
shot is taken. For moderately large diffusion the boundary layer at r + is resolved by the truth 
space discretization and the boundary conditions at the outflow boundary are satisfied exactly. 
For smaller diffusion in the middle example the truth space discretization can no longer resolve 
the boundary layer and for very small diffusion (right) the solution is close to the one for pure 


transport. The rationale of (5.13) is that all norms commonly used for convection diffusion 


equations resemble the one chosen here, for instance in the form of a mesh dependent “broken 
norm”, which means that most part of the incurred error of an approximation is concentrated in 
the layer region, see e.g. EH US. Hence, when the layers are not resolved by the discretization, 
enforcing the boundary conditions does not improve accuracy and, on the contrary, may degrade 
accuracy away from the layer by causing oscillations. The present formulation instead avoids 
any non-physical oscillations and enhances accuracy in those parts of the domain where this is 
possible for the afforded discretization, see dm™ for a detailed discussion. The following 
table quantifies the results for the case of small diffusion e = 2 -26 and a truth discretization 
whose a posteriori error bound is 0.002. 
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Figure 5.2. Convection-diffusion equation, e = 2 26 , maximal a-posteriori 
error 0.00208994 


Table 1. Convection-diffusion equation, e = 2 26 , maximal a-posteriori error 
0.00208994 


n 

Uy 

8 

surrogate 

surr/a-post 

n 

Tly 

8 

surrogate 

surr/a-post 

2 

5 

1.36e-03 

2.10e-01 

1.01e+02 

14 

39 

1.17e-04 

8.15e-03 

3.90e+00 

4 

9 

1.10e-02 

7.51e-02 

3.59e+01 

16 

45 

9.79e-05 

7.56e-03 

3.62e+00 

6 

15 

1.75e-03 

4.95e-02 

2.37e+01 

18 

51 

6.32e-05 

7.40e-03 

3.54e+00 

8 

21 

9.16e-04 

2.34e-02 

1.12e+01 

20 

57 

4.74e-05 

6.09e-03 

2.92e+00 

10 

27 

3.65e-04 

2.05e-02 

9.82e+00 

22 

63 

2.36e-05 

5.43e-03 

2.60e+00 

12 

33 

3.34e-04 

1.56e-02 

7.45e+00 

24 

65 

2.36e-05 

4.73e-03 

2.27e+00 


The columns 3 and 8 show the 8 governing the condition of the saddle point problems (and 
hence of the corresponding Petrov-Galerkin problems), see (5.3), the greedy space growth is 
based upon. Hence the surrogates are very tight giving rise to weakness parameters very close 
to one. As indicated in Remark |4.3| one can use also an a posteriori bound for the truth 
solution based on the corresponding lifted residual. Columns 5 and 10 show therefore the 
relative accuracy of the current reduced model and the truth model. This corresponds to the 
stability constants computed by conventional RBMs. Even for elliptic problems these latter 
ones are significantly larger than the ones for the present singularly perturbed problem which 
are guaranteed to be close to one by the method itself. Based on the a posteriori bounds for 
the truth solution (which are also obtained with the aid of tailored well-conditioned variational 
formulations, see E), the greedy space growth is stopped when the surrogates reach the order of 
the truth accuracy. As illustrated in Figure [572} in the present example this is essentially already 
the case for < 20 trial reduced basis functions and almost three times as many test functions. To 
show this “saturation effect” we have continued the space growth formally up to n = 24 showing 
no further significant improvement which is in agreement with the resolution provided by the 
truth space. These relations agree with the theoretical predictions in [TO] . Figure [A2] illustrates 
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also the rapid gain of accuracy by the first few reduced basis functions which supports the fact 
that the solution manifold is “well seen” by the Petrov-Galerkin surrogates. More extensive 
numerical tests shown in pTO] show that the achieved stability is independent of the diffusion 
but the larger the diffusion the smaller become the dimensions n = dim U n , ny = diml/jj for the 
reduced spaces. This indicates the expected fact that the larger the diffusion the smoother is 
the dependence of u(y) on the parameter y. In fact, when e —> 0 one approaches the regime of 
pure transport where the smoothness of the parameter dependence is merely Holder continuity 
requiring for a given target accuracy a larger number of reduced basis functions, see jlOj . 


6. Is it Necessary to Resolve All of Ml 


The central focus of the preceding discussion has been to control the maximal deviation 

(6.1) <J n (M)u = max\\u(y) - P UtUn u(y)\\u, 

y&y 

and to push this deviation below a given tolerance for n as small as possible. However, in many 
applications one is not interested in the whole solution field but only in a quantity of interest 
I{y), typically of the form I(y) = £(u(y)) where £ G U' is a bounded linear functional. Looking 
then for some desired optimal state I* = £{u{y*)) one is interested in a guarantee of the form 

(6.2) \£{u n (y)) - £(u(y))\ < tol, y Gy, 

where the states u n ( y ) belong to a possibly small reduced space U n in order to be then able to 
carry out the optimization over y G y in the small space U n C U. Asking only for the values 
of just a linear functional of the solution seems to be much less demanding than asking for the 
whole solution and one wonders whether this can be exploited in favor of even better online 
efficiency. 

Trying to reduce computational complexity by exploiting the fact that, retrieving only a 
linear functional of an unknown state - a scalar quantity - may require less information than 
recovering the whole state, is the central theme of goal oriented adaptation in finite element 
methods, see [EJ. Often the desired accuracy is indeed observed to be reached by significantly 
coarser discretizations than needed to approximate the whole solution within a corresponding 
accuracy. The underlying effect, sometimes referred to as “squared accuracy” is well understood 
and exploited in the RBM context as well, see usual. We briefly sketch the main ideas for 
the current larger scope of problems and point out that, nevertheless, a guarantee of the form 


First, a trivial estimate gives for i G U' 


(6.2) ultimately requires controlling the maximal deviation of a reduced space in the sense of 
(6.1). Hence, an optimal sampling of a solution manifold remains crucial. 


(6.3) 


I £{u n (y)) - £{u{y ))| < \\£\\u’\\u n {_y) - u(y)\\u 


so that a control of a n (M)u would indeed yield a guarantee. However, the n needed to drive 
\\£\\u'&n(M)u below tol is usually larger than necessary. 

To explain the principle of improving on (6.3) we consider again a variational problem of 


the form ( |4.3| ) (suppressing any parameter dependence for a moment) for a pair of spaces U, V 
where we assume now that Kuy{B) < Cb/cb is already small, possibly after renorming an 
initial less favorable formulation through (4.6) or (4.8). Let u G U again denote the exact 


solution of (4.3). Given a £ G U' we wish to approximate £{u), using an approximate solution 


u G W C U defined by 
(6.4) 


b(u,v) = (f,v), vGV(W)cV, 
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where V(W) is a suitable test space generated by the methods discussed in [4.1 In addition 
we will use the solution z £ V of the dual problem: 

(6.5) b(w, z) = —£(w), w £ U, 
together with an approximation z £ Z C V defined by 

(6.6) b(w,z) = —£(w), w£W(Z)cU , 

again with a suitable test space W(Z). Recall that we need not determine the test spaces 
V(W), W(Z) explicitly but rather realize the corresponding Petrov-Galerkin projections through 
the equivalent saddle point formulations with suitable (5-proximal auxiliary spaces generated by 
a greedy stabilization. 

Then, defining the primal residual functional 

(6.7) ra(v) := r(u, v) := b(u - u, v) = (/, v) - b(u, v), 

and adapting the ideas in mm for the symmetric case V = U to the present slightly more 
general setting, we claim that 

(6.8) £(u) := £{u) — r(u,z) 
is an approximation to the true value £(u) satisfying 


(6.9) 


\i{u) - £{u )| < C inf ||u - w\\u inf | 

w£W v£Z 


\z - v\\ v , 


where C depends only on the inf-sup constant of the finite dimensional problems. In fact, since 
by ( |R5t , 

£(u) — £(u) = b(u — u,z) = —r{u , z), 
one has £{u) = £{u ) — r(u, z) and hence 

\£{u) — £{u )| = \£{u) — r(u , z) — £{u) + r(u , z)\ = \r(u, z — z)\ = | b(u — u, z — z)\ 

< C b \\u-u\\u\\z- z\\ v , 

which confirms the claim since u, z are near-best approximations due to the asserted inf-sup 
stability of finite dimensional problems. 

Clearly, (6.9) says that in order to approximate £(u) the primal approximation in U need 
not resolve u at all as long as the dual solution z is approximated well enough. Moreover, when 
£ is a local functional, e.g. a local average approximating a point evaluation, z is close to the 
corresponding Green’s function with (near) singularity in the support of £. In the elliptic case z 
would be very smooth away from the support of £ and hence well approximate by a relatively 
small number of degrees of freedom concentrated around the support of £. Thus, it may very 
well be more profitable to spend less effort on approximating u than on approximating z. 

Returning to parameter dependent problems (5.1|, the methods in ^5] can now be used 
as follows to construct possibly small reduced spaces for a frequent online evaluation of the 
quantities I(y) = £(u(y)). We assume that we already have properly renormed families of 
norms || • ||j/ , || • ||y , y £ y, with uniform inf-sup constants close to one. We also assume 
now that both families of norms are equivalent (by compactness of y uniformly equivalent) to 
reference norms II • IIe/> II • Ik) respectively. Hence we can consider two solution manifolds 


M nr := {u(y) = B-'f, y £ y} C U, M dual := {z(y) := B ~*£, y £ y} C V, 


L pr • i J i tf ^ <-r J ^ i * ’ i auai • i ~\yj • -‘-'y 

and use Algorithm [3j SGA-DOU to generate (essentially in parallel) two sequences of pairs of 
reduced spaces 


(U n ,v n ), (Z„,W n ), n£ N. 
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Here V n C V, W n C U are suitable stabilizing spaces such that for m < n and for the corre¬ 
sponding reduced solutions u m (y) £ U m , z n - m (y) £ Z n _ m the quantity 

(6.10) In,m(y) ■= i(u m (y)) - r(u m (y ), 2„_ m (y)) 

satisfies 


( 6 . 11 ) 


I Kv) - In,m (2/)| < C<J m (M pT )uCrn-m{Mdua.l)v, 


with a constant C independent of n, m. The choice of m < n determines how to distribute the 
computational effort for computing the two sequences of reduced bases and their stabilizing 
companion spaces. By Theorem 5.2 one can see that whichever n-width rate d n (M pr )u or 
d n (A4 duai)v decays faster one can choose m < n to achieve for a total of dim U rn + dim Z n _ rn = 
n the smallest error bound. Of course, the rates are not known and one can use the tight 
surrogates to bound and estimate the respective errors very accurately. For instance, when 
d n (M pr )u < Cn~ a , d n [Mdna\)v < C'n _/3 , m = yields an optimal distribution with 

a bound 


( 6 . 12 ) 


|/(S) - W#)l < c(^)"(^) V<"+« 


In particular, when /3 > a the dimensions on the reduced bases for the dual problem should 
be somewhat larger but essentially using the same dimensions for the primal and dual reduced 
spaces yields the rate n~^ a+ ^ confirming the “squaring” when a = /3. In contrast, as soon as 
either one of the n-width rates decays exponentially it is best to grow only the reduced spaces 
for the faster decay while keeping a fixed space for the other side. 


7. Summary 

We have reviewed recent developments concerning reduced basis methods with the following 
main focus. Using Kolmogorov n-width as a benchmark for the performance of reduced basis 
methods in terms of minimizing the dimensions of the reduced models for a given target accu¬ 
racy, we have shown that this requires essentially to construct tight well-conditioned surrogates 
for the underlying variational problem. We have explained how renormation in combination 
with inner stabilization loops can be used to derive such residual based surrogates even for 
problem classes not covered by conventional schemes. This includes in a fully robust way in¬ 
definite as well as ill-conditioned (singularly perturbed) coercive problems. Greedy strategies 
based on such surrogates are then shown to constitute an optimal sampling strategy, i.e., the 
resulting snapshots span reduced spaces whose distances from the solution manifold decay es¬ 
sentially at the same rate as the Kolmogorov n-widths. This means, in particular, that stability 
constants need not be determined by additional typically expensive computations but can be 
pushed by the stabilizing inner greedy loop as close to one as one wishes. Finally, we have 
explained why the focus on uniform approximation of the entire solution manifold is equally 
relevant for applications where only functionals of the parameter dependent solutions have to 
be approximated. 
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